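Objective: Knowledge-based planning (KBP) typically involves training an end-to-end deep learning model to predict dose distributions. However, end-to-end methods may face practical limitations because the medical datasets commonly used are limited in size. To address these limitations, we propose a content-based image retrieval (CBIR) method that retrieves the dose distributions of previously planned patients based on anatomical similarity. Approach: Our proposed CBIR method trains a representation model that produces latent space embeddings of a patient's anatomical information. The latent space embedding of a new patient is then compared against those of previous patients in a database to retrieve dose distribution images. All source code for this project is available on GitHub. Main results: The retrieval performance of various CBIR methods was evaluated on a dataset consisting of both publicly available plans and clinical plans from our institution. This study compares various encoding methods, ranging from simple autoencoders to more recent Siamese networks such as SimSiam, and the best performance was observed for the multitask Siamese network. Significance: Applying CBIR to inform subsequent treatment planning may address many of the limitations associated with end-to-end KBP. Our current results show that excellent image retrieval performance can be obtained with slight changes to previously developed Siamese networks. In future work, we hope to integrate CBIR into automated planning workflows through methods such as the MetaPlanner framework.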
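The retrieval step described above can be illustrated with a minimal sketch. It assumes the embeddings are already computed by some representation model and that cosine similarity is the comparison metric; the abstract does not state the metric, and the names `cosine_similarity`, `retrieve_most_similar`, and the toy database are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve_most_similar(query_embedding, database):
    """Return the ID of the stored patient whose embedding is closest to the query.

    `database` maps a patient ID to that patient's latent embedding (assumed layout).
    """
    return max(
        database,
        key=lambda pid: cosine_similarity(query_embedding, database[pid]),
    )

# Toy embeddings standing in for the output of a trained encoder.
database = {
    "patient_a": [1.0, 0.0, 0.0],
    "patient_b": [0.0, 1.0, 0.0],
    "patient_c": [0.7, 0.7, 0.0],
}
query = [0.9, 0.1, 0.0]
best_match = retrieve_most_similar(query, database)
```

In a full pipeline, the dose distribution stored for `best_match` would then be returned as the retrieved plan.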
Agents that can follow language instructions are expected to be useful in a variety of situations such as navigation. However, training neural network-based agents requires numerous paired trajectories and language instructions. This paper proposes using multimodal generative models for semi-supervised learning in instruction following tasks. The models learn a shared representation of the paired data, and enable semi-supervised learning by reconstructing unpaired data through the representation. Key challenges in applying the models to sequence-to-sequence tasks, including instruction following, are learning a shared representation of variable-length multimodal data and incorporating attention mechanisms. To address these problems, this paper proposes a novel network architecture that absorbs the difference in the sequence lengths of the multimodal data. In addition, to further improve performance, this paper shows how to combine the generative model-based approach with an existing semi-supervised method called the speaker-follower model, and proposes a regularization term that improves inference using unpaired trajectories. Experiments on the BabyAI and Room-to-Room (R2R) environments show that the proposed method improves the performance of instruction following by leveraging unpaired data, and improves the performance of the speaker-follower model by 2% to 4% in R2R.
A polarization camera has great potential for 3D reconstruction since the angle of polarization (AoP) and the degree of polarization (DoP) of reflected light are related to an object's surface normal. In this paper, we propose a novel 3D reconstruction method called Polarimetric Multi-View Inverse Rendering (Polarimetric MVIR) that effectively exploits geometric, photometric, and polarimetric cues extracted from input multi-view color-polarization images. We first estimate camera poses and an initial 3D model by geometric reconstruction with a standard structure-from-motion and multi-view stereo pipeline. We then refine the initial model by optimizing photometric rendering errors and polarimetric errors using multi-view RGB, AoP, and DoP images, where we propose a novel polarimetric cost function that enables an effective constraint on the estimated surface normal of each vertex, while considering four possible ambiguous azimuth angles revealed from the AoP measurement. The weight for the polarimetric cost is effectively determined based on the DoP measurement, which is regarded as the reliability of polarimetric information. Experimental results using both synthetic and real data demonstrate that our Polarimetric MVIR can reconstruct a detailed 3D shape without assuming a specific surface material and lighting condition.
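As a rough illustration of the polarimetric cues mentioned above, the sketch below computes AoP and DoP from the standard linear Stokes parameters. It assumes intensities captured behind polarizers at 0°, 45°, 90°, and 135°, which is a common sensor layout but is not stated in the abstract. The candidate azimuths are assumed to be the AoP shifted by multiples of π/2; the function names are hypothetical.

```python
import math

def aop_and_dop(i0, i45, i90, i135):
    """Angle and degree of linear polarization from four polarizer intensities."""
    s0 = 0.5 * (i0 + i45 + i90 + i135)   # total intensity
    s1 = i0 - i90                        # horizontal vs. vertical component
    s2 = i45 - i135                      # diagonal components
    aop = (0.5 * math.atan2(s2, s1)) % math.pi
    dop = math.hypot(s1, s2) / s0 if s0 > 0 else 0.0
    return aop, dop

def candidate_azimuths(aop):
    """Four azimuth candidates, assuming ambiguities at multiples of pi/2."""
    return [(aop + k * math.pi / 2) % (2 * math.pi) for k in range(4)]

# Fully polarized light aligned with the 0-degree polarizer.
aop, dop = aop_and_dop(1.0, 0.5, 0.0, 0.5)
candidates = candidate_azimuths(aop)
```

A cost function in the spirit of the abstract could then penalize the deviation of a vertex normal's azimuth from the nearest candidate, weighted by `dop`.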
Diagnostic radiologists need artificial intelligence (AI) for medical imaging, but access to the medical images required for training AI has become increasingly restrictive. To release and use medical images, we need an algorithm that can simultaneously protect privacy and preserve pathologies in medical images. To develop such an algorithm, we propose DP-GLOW, a hybrid of a local differential privacy (LDP) algorithm and one of the flow-based deep generative models (GLOW). By applying a GLOW model, we disentangle the pixelwise correlation of images; this correlation is what makes it difficult to protect privacy with straightforward LDP algorithms for images. Specifically, we map images onto the latent vector of the GLOW model, each element of which follows an independent normal distribution, and we apply the Laplace mechanism to the latent vector. Moreover, we apply DP-GLOW to chest X-ray images to generate LDP images while preserving pathologies.
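The Laplace mechanism on a latent vector can be sketched as follows. This is only an illustration under stated assumptions: each latent element is clipped to `[-clip, clip]` to bound the sensitivity (the abstract does not say how sensitivity is bounded), and the privacy budget `epsilon` is spent on the whole vector. The function name `privatize_latent` is hypothetical, and the GLOW encoder and decoder are omitted.

```python
import random

def privatize_latent(latent, epsilon, clip=1.0, rng=None):
    """Add Laplace noise to a clipped latent vector (epsilon-LDP under the clipping assumption)."""
    rng = rng or random.Random()
    # Clipping bounds each element, so any two inputs differ by at most 2*clip per element.
    clipped = [max(-clip, min(clip, z)) for z in latent]
    sensitivity = 2.0 * clip * len(clipped)   # worst-case L1 distance between two inputs
    scale = sensitivity / epsilon             # Laplace scale parameter b
    # The difference of two i.i.d. exponentials with mean b is Laplace(0, b).
    return [
        z + rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        for z in clipped
    ]

noisy = privatize_latent([0.2, -3.0, 0.5], epsilon=1.0, rng=random.Random(0))
```

Smaller values of `epsilon` give stronger privacy but larger noise, so the decoded image would drift further from the original.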
This study targets the mixed-integer black-box optimization (MI-BBO) problem, where continuous and integer variables should be optimized simultaneously. The CMA-ES, our focus in this study, is a population-based stochastic search method that samples solution candidates from a multivariate Gaussian distribution (MGD) and shows excellent performance in continuous BBO. The parameters of the MGD, the mean and (co)variance, are updated based on the evaluation values of candidate solutions in the CMA-ES. If the CMA-ES is applied to MI-BBO with straightforward discretization, however, the variance corresponding to the integer variables becomes much smaller than the granularity of the discretization before the optimal solution is reached, which leads to stagnation of the optimization. In particular, when binary variables are included in the problem, this stagnation is more likely to occur because the granularity of the discretization becomes coarser, and the existing modification to the CMA-ES does not address this stagnation. To overcome these limitations, we propose a simple extension of the CMA-ES based on lower-bounding the marginal probabilities associated with the generation of integer variables in the MGD. Numerical experiments on MI-BBO benchmark problems demonstrate the efficiency and robustness of the proposed method. Furthermore, to demonstrate the generality of the idea behind the proposed method, we incorporate it into the multi-objective CMA-ES in addition to the single-objective case and verify its performance on bi-objective mixed-integer benchmark problems.
Text-to-speech synthesis (TTS) is a task to convert texts into speech. Two of the factors that have been driving TTS are the advancements of probabilistic models and latent representation learning. We propose a TTS method based on latent variable conversion using a diffusion probabilistic model and the variational autoencoder (VAE). In our TTS method, we use a waveform model based on VAE, a diffusion model that predicts the distribution of latent variables in the waveform model from texts, and an alignment model that learns alignments between the text and speech latent sequences. Our method integrates diffusion with VAE by modeling both mean and variance parameters with diffusion, where the target distribution is determined by approximation from VAE. This latent variable conversion framework potentially enables us to flexibly incorporate various latent feature extractors. Our experiments show that our method is robust to linguistic labels with poor orthography and alignment errors.
End-to-end text-to-speech synthesis (TTS) can generate highly natural synthetic speech from raw text. However, rendering the correct pitch accents is still a challenging problem for end-to-end TTS. To tackle the challenge of rendering correct pitch accents in Japanese end-to-end TTS, we adopt PnG BERT, a self-supervised pretrained model in the character and phoneme domain for TTS. We investigate the effects of features captured by PnG BERT on Japanese TTS by modifying the fine-tuning condition to determine which conditions are helpful for inferring pitch accents. We shift the content of PnG BERT features from text-oriented to speech-oriented by changing the number of fine-tuned layers during TTS. In addition, we teach PnG BERT pitch accent information by fine-tuning with tone prediction as an additional downstream task. Our experimental results show that the features of PnG BERT captured by pretraining contain information helpful for inferring pitch accents, and that PnG BERT outperforms the baseline Tacotron on accent correctness in a listening test.
Edema is a common symptom of kidney disease, and quantitative measurement of edema is desired. This paper presents a method to estimate the degree of edema from facial images of renal failure patients taken before and after dialysis. As tasks to estimate the degree of edema, we perform pre- and post-dialysis classification and body weight prediction. We develop a multi-patient pre-training framework for acquiring knowledge of edema and transfer the pre-trained model to a model for each patient. For effective pre-training, we propose a novel contrastive representation learning method called weight-aware supervised momentum contrast (WeightSupMoCo). WeightSupMoCo aims to bring the feature representations of facial images closer together according to the similarity of patient weight when the pre- and post-dialysis labels are the same. Experimental results show that our pre-training approach improves the accuracy of pre- and post-dialysis classification by 15.1% and reduces the mean absolute error of weight prediction by 0.243 kg compared with training from scratch. The proposed method accurately estimates the degree of edema from facial images; our edema estimation system could thus be beneficial to dialysis patients.
Peripheral blood oxygen saturation (SpO2), an indicator of oxygen levels in the blood, is one of the most important physiological parameters. Although SpO2 is usually measured using a pulse oximeter, non-contact SpO2 estimation methods from facial or hand videos have been attracting attention in recent years. In this paper, we propose an SpO2 estimation method from facial videos based on convolutional neural networks (CNN). Our method constructs CNN models that consider the direct current (DC) and alternating current (AC) components extracted from the RGB signals of facial videos, which are important in the principle of SpO2 estimation. Specifically, we extract the DC and AC components from the spatio-temporal map using filtering processes and train CNN models to predict SpO2 from these components. We also propose an end-to-end model that predicts SpO2 directly from the spatio-temporal map by extracting the DC and AC components via convolutional layers. Experiments using facial videos and SpO2 data from 50 subjects demonstrate that the proposed method achieves a better estimation performance than current state-of-the-art SpO2 estimation methods.
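The role of the DC and AC components can be illustrated with the classical ratio-of-ratios principle from pulse oximetry. The sketch below is not the CNN method of the abstract: it assumes the DC component is the temporal mean and the AC component is the standard deviation of a channel signal, and it compares two color channels (red and blue here, as an assumption). Mapping the ratio to an SpO2 value requires a calibration that is not given in the source, so it is left out.

```python
import math
import statistics

def dc_ac(signal):
    """Split a channel signal into a DC level (mean) and an AC amplitude (std. dev.)."""
    dc = statistics.fmean(signal)
    ac = statistics.pstdev(signal)
    return dc, ac

def ratio_of_ratios(channel_1, channel_2):
    """Ratio of the normalized pulsatile amplitudes of two color channels."""
    dc_1, ac_1 = dc_ac(channel_1)
    dc_2, ac_2 = dc_ac(channel_2)
    return (ac_1 / dc_1) / (ac_2 / dc_2)

# Synthetic pulse-like signals: same relative pulsation in both channels.
pulse = [math.sin(2 * math.pi * t / 30) for t in range(300)]
red = [100.0 + 2.0 * p for p in pulse]
blue = [50.0 + 1.0 * p for p in pulse]
ratio = ratio_of_ratios(red, blue)
```

In the learned approach described above, the convolutional layers would take the place of these hand-written filters.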
Off-policy evaluation (OPE) attempts to predict the performance of counterfactual policies using logged data from a different policy. We extend its applicability by developing an OPE method for a class of both full support and deficient support logging policies in contextual-bandit settings. This class includes deterministic bandit algorithms (such as Upper Confidence Bound) as well as deterministic decision-making based on supervised and unsupervised learning. We prove that our method's prediction converges in probability to the true performance of a counterfactual policy as the sample size increases. We validate our method with experiments on partly and entirely deterministic logging policies. Finally, we apply it to evaluate the coupon targeting policies of a major online platform and show how to improve the existing policy.
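For context, the standard inverse propensity weighting (IPW) estimator for contextual bandits can be sketched as below. This is the textbook full-support baseline, not the method proposed in the abstract: it divides by the logging probability of each logged action, so it is undefined for actions that a deterministic logging policy never takes. The log layout and function names are hypothetical.

```python
def ipw_estimate(logs, target_policy):
    """Estimate the value of `target_policy` from logged bandit feedback.

    Each log entry is (context, action, reward, logging_probability), where
    logging_probability is the probability the logging policy gave to `action`.
    `target_policy(context, action)` returns the target policy's probability.
    """
    total = 0.0
    for context, action, reward, logging_prob in logs:
        weight = target_policy(context, action) / logging_prob  # importance weight
        total += weight * reward
    return total / len(logs)

# Logging policy chose uniformly between actions 0 and 1; action 1 always pays 1.0.
logs = [
    ("user", 1, 1.0, 0.5),
    ("user", 0, 0.0, 0.5),
]

def always_action_1(context, action):
    return 1.0 if action == 1 else 0.0

estimate = ipw_estimate(logs, always_action_1)
```

When `logging_prob` is zero for an action the target policy would choose, this estimator has no data to reweight, which is the deficient support case the abstract addresses.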